Update the course material for this week

To update the course repository on your computer, you will pull a current copy of the repository. To do this:

  1. Open your terminal/bash.

  2. Navigate to the course repository. If this is in your root directory then type:

    cd
    cd EEMB595TE
  3. Paste the following into the terminal/bash:

    git pull


Last weeks objectives

  1. To learn the general principles of what occurs mathematically when you indicate a random effect

  2. To learn how to create a state-space model


This weeks objectives

  1. Learn the differences between types of mark-recapture models


REVISED: Course schedule

Week Date Subject Book.chapters
1 5-Apr Course outline & Github
2 12-Apr Bayesian vs. Frequentist Chpt 1,2
3 19-Apr GLM in R and JAGS Chpt 3
4 26-Apr Random effects & State-space models Chpt 4, 5
5 3-May Random effects & State-space models Chpt 4, 5
6 10-May Analyzing mark-recapture data (JS models) Chpt 10
7 17-May Occupancy & N-mixture models Chpt 13
8 24-May Student presentations
9 31-May Student presentations
10 7-Jun Student presentations

Everyone should have their presentations/code ready for 24 May 2018 Either put it in a google drive file, drobox, box, or thumbdrive

Presentation sign up sheet

Only 3 people have signed up- please do so as soon as possible. Presentations will start in 2 weeks from today.


Example of Multi-state systems

The example listed here are not all inclusive:

  • Age structure: baby, juvenile, adult, dead

  • Disease: infected, not infected, dead

  • Movement: location A, location B, dead

  • Sexual maturity:

  • Fruiting adult, not fruiting adult, dead

  • Breeding adult, not breeding adult, dead

  • Plant state: vegetative, flowering, dormant

Types of Multi-state Mark-recapture models

These are the two most popular mark-recapture models are:

  • Cormack-Jolly-Seber Model (CJS)

  • Jolly-Seber Model (JS)

There are others:

  • But most of them stem from variations of these models

What parameters can you estimate using these two models?

Cormack-Jolly-Seber Model Jolly-Seber
Survival (\(\phi_{1}\)) Survival (\(\phi_{1}\))
Survival (\(\phi_{2}\)) Survival (\(\phi_{2}\))
Transistion probability (\(\psi_{1,2}\)) Transistion probability (\(\psi_{1,2}\))
Transistion probability (\(\psi_{2,1}\)) Transistion probability (\(\psi_{2,1}\))
Recruitment (\(\gamma_{1}\))
Recruitment (\(\gamma_{2}\))
Abundance (N)

What is the main difference between the CJS and JS models?

The Cormack-Jolly-Seber model disregards the data prior to the first encounter:

  • So it can not estimate recruitment

  • y = [0, 0, 0, 1, 1, 0, 0]

  • The model will start using information from the fourth season when the individual was captured (0 = not seen; 1 = seen)

The Jolly-Seber model uses the entire capture history:

  • So it can not estimate recruitment

  • y = [0, 0, 0, 1, 1, 0, 0]

  • The model will take advantage of the entire capture history- starting with the first season

    • It needs to determine was the organism present at the site and not seen (false negative) and if so- was this when it was recruited.


Model assumptions

  1. Cormack-Jolly-Seber Model
  • The transition and observation probabilities must be the same for all individuals at a given occasion and state

  • Individuals must be independent from one another

  • Individuals and states are recorded without error

    • If this is the case, look into multi-event models
  • No marks are lost

  1. Jolly-Seber Model
  • Same as the Cormack-Jolly-Seber model plus 2 more

  • The captured individuals are a random sample of the population

    • i.e., there is no trapping method does not affect the probability of recapturing
  • All individuals (marked and unmarked) in the population have the same capture probability

    • This is met when the same protocol is used each season


Collecting data for animals that could be in one of 2 states

  • State process: Unobserved (truth)

  • Observed process: Observed (with sampling error)

  • Where y = 1 (seen) and y = 0 (not seen)

  • The observations for one individual would be:

  • y = [1, 1, 0, 1, 0, 0, 0]

  • We need to determine if the individual for the last 3 sampling periods was dead or alive and not seen

  • We monitor several individuals so the data we collect will look like this:

##    Season1 Season2 Season3 Season4 Season5
## 1        1       0       1       0       0
## 2        1       0       1       1       1
## 3        0       0       1       0       0
## 4        0       0       1       1       1
## 5        0       1       1       0       0
## 6        1       1       1       1       1
## 7        0       0       1       0       0
## 8        0       1       1       1       0
## 9        0       0       0       1       0
## 10       0       1       0       1       0


Collecting data for animals that could be in one of 3 states

  • State process: Unobserved (truth)

  • Observed process: Observed (with sampling error)

  • Where y = 1 (seen in state 1), y = 2 (seen in state 2), and y = 0 (not seen)

  • You define and categorize what does state 1 equal and what does state 2 equal

  • The observations for one individual would be:

  • y = [1, 2, 0, 1, 0, 0, 0]

  • We need to determine if the individual for the last 3 sampling periods was dead or alive and not seen

  • We monitor several individuals so the data we collect will look like this:

##    Season1 Season2 Season3 Season4 Season5
## 1        1       1       0       1       2
## 2        1       0       0       0       0
## 3        0       1       2       1       1
## 4        1       0       0       0       1
## 5        1       1       0       2       0
## 6        0       0       1       0       0
## 7        0       0       1       0       1
## 8        0       2       0       1       1
## 9        2       2       0       1       0
## 10       1       0       2       0       1

Working exampling

We are interested in disease state of the chytrid fungus (infected or not infected) for amphibians.

Data collection methods:

  • We go out 1 night for 10 years

  • We give individuals unique IDs (using toe clipping method)

  • We collect skin swab samples from individuals to determine in the lab the disease state

Model outline

We denote the true but, unknowable, disease state of individual i at occasion t as \(z_{i,t}\), where \(z_{i,t} = 1\ldots4\) and represents the states “not entered”, “uninfected”, “infected”, or “dead”, respectively. To estimate monthly survival \((\phi)\), recruitment \((\gamma)\), and transition \((\beta)\) rates, we used the transition matrix \(\psi\), where the cells represent the probabilities of moving from a row state to a column state from time t-1 to time t for individual i.



The “not entered” category consists of individuals that are not part of the population yet, where the parameters \(\gamma_{2}\) and \(\gamma_{3}\) are the state-specific entry probabilities for uninfected and infected hosts from t-1 to t, i.e., the probability that an individual in state i enters the population at time t. The parameter \(\phi_{2}\) and \(\phi_{3}\) are the state-specific survival probability for uninfected and infected hosts from t-1 to t, while the parameters \(\beta_{UI}\) and \(\beta_{IU}\) are the infection and recovery probabilities, respectively. In other words, if the individual i survives from t-1 to t, it can become infected if they were uninfected at t-1, or recover from infection if there were infected at time t-1, with probabilities \(\beta_{UI}\) and \(\beta_{IU}\). Each parameter must be between [0,1] and each group of parameters (recruitment, survival, and transition) must sum to 1. For survival and transition this is easy. However, for recruitment this is less straightforward. One solution is to choose a Dirichlet prior for \([\gamma_{1}, \gamma_{2}, \gamma_{3}]\) to ensure that these parameters sum to 1.

Because there are more than two possible true and observed states, we use the categorical distribution to model the transition from one state to another for individual i each time step:

\[z_{i,t} \mid z_{i,t-1} \sim \textrm{categorical}(\psi_{z_{i,t-1},1:4}),\]

where \(\psi\) is the transition matrix we just defined. Note that the argument of the categorical distribution is a vector of length 4; that is, it is the row of the matrix \(\psi\) corresponding to the state of individual i in time step t-1.

To estimate host recapture probabilities, we use a second transition matrix \(\pi\), which determines the probabilities of the possible observation outcomes (columns) for the true state of each captured individual (rows). We do not assume partial observations or state misclassifications and the observed states are “seen uninfected”, “seen infected”, and “not seen”.



We use the categorical to model the observed state \(y_{i,t}\) as a function of the true state:

\[y_{i,t} \mid z_{i,t} \sim \textrm{categorical}(\pi_{z_{i,t},1:3}),\]

where \(\pi\) is the observation matrix we just defined.. Similarly as before, the argument of the categorical distribution is a vector of length 3; that is, it is the row of the matrix \(\pi\) corresponding to the true state of individual i in time step t.


Import our data and examine the data

CH <- read.csv("/Users/Cici/EEMB595TE/6_Multi-state_Mark_Recapture_Models/Data/data_amphibian.csv")[,-1]

Look at the structure of the data and make sure it was imported correctly

str(CH)
## 'data.frame':    181 obs. of  5 variables:
##  $ V1: int  0 1 1 0 1 0 1 1 1 0 ...
##  $ V2: int  1 0 2 0 2 1 0 0 0 2 ...
##  $ V3: int  1 0 0 0 0 0 0 2 0 0 ...
##  $ V4: int  0 0 0 1 0 1 0 0 0 0 ...
##  $ V5: int  1 0 0 2 0 1 0 0 0 0 ...

Remember our observations: - 0 = not seen - 1 = seen as uninfected - 2 = seen as infected

head(CH)
##   V1 V2 V3 V4 V5
## 1  0  1  1  0  1
## 2  1  0  0  0  0
## 3  1  2  0  0  0
## 4  0  0  0  1  2
## 5  1  2  0  0  0
## 6  0  1  0  1  1

JAGS model

** Note that this code has a Bayesian posterior predictive check code and we will ignore it for now **

Quick note the Bayesian posteriod predictive check or also known as the Bayesian p-value is used to evaluate how well does your data match predictions from the model you created. Values near 0.5 = well fitting model; whereas values near 0.95 and 0.05 = poor fitting model

{
sink("model_JS.txt")
cat("

model {
    
#--------------------------------------
# Parameters:
  # phi_U: survival probability of uninfected
  # phi_I: survival probability of infected
  # gamma_U: removal entry probability of uninfected
  # gamma_I: removal entry probability of infected
  # p_U: capture probability of uninfected
  # p_I: capture probability of infected
#--------------------------------------
# States (S):
  # 1 not yet entered
  # 2 uninfected
  # 3 infected
  # 4 dead
# Observations (O):
  # 1 seen as uninfected
  # 2 seen as infected
  # 3 not seen
#--------------------------------------
    
# Priors and constraints

phi_U ~ dunif(0, 1)   # Prior for mean survival
phi_I ~ dunif(0, 1)   # Prior for mean survival

for(t in 1:(K-1)){
  gamma_U[t] ~ dunif(0, 1)   # Prior for entry probabilities
  gamma_I[t] ~ dunif(0, 1)   # Prior for entry probabilities
}

p_U ~ dunif(0, 1)     # Prior for mean capture
p_I ~ dunif(0, 1)     # Prior for mean capture

beta_UI ~ dunif(0, 1)
beta_IU ~ dunif(0, 1)

# Define state and observation matricies
for(i in 1:M){  
    for(t in 1:(K-1)){
      # Define probabilities of state S(t+1) given S(t)
      # This is the psi matrix from above

        ps[1,i,t,1] <- 1 - gamma_U[t] - gamma_I[t]
        ps[1,i,t,2] <- gamma_U[t]
        ps[1,i,t,3] <- gamma_I[t]
        ps[1,i,t,4] <- 0

        ps[2,i,t,1] <- 0
        ps[2,i,t,2] <- phi_U * (1 - beta_UI)
        ps[2,i,t,3] <- phi_U * beta_UI
        ps[2,i,t,4] <- 1 - phi_U

        ps[3,i,t,1] <- 0
        ps[3,i,t,2] <- phi_I * beta_IU
        ps[3,i,t,3] <- phi_I * (1 - beta_IU)
        ps[3,i,t,4] <- 1 - phi_I

        ps[4,i,t,1] <- 0
        ps[4,i,t,2] <- 0
        ps[4,i,t,3] <- 0
        ps[4,i,t,4] <- 1

# This is the pi matrix from above

        po[1,i,t,1] <- 0
        po[1,i,t,2] <- 0
        po[1,i,t,3] <- 1

        po[2,i,t,1] <- p_U
        po[2,i,t,2] <- 0
        po[2,i,t,3] <- 1 - p_U

        po[3,i,t,1] <- 0   
        po[3,i,t,2] <- p_I 
        po[3,i,t,3] <- 1 - p_I

        po[4,i,t,1] <- 0
        po[4,i,t,2] <- 0
        po[4,i,t,3] <- 1
    } #t
} #i

#------ Likelihood model

for (i in 1:M){
  # Define latent state at first occasion
    z[i, 1] <- 1   
    # Make sure that all M individuals are in state 1 at t = 1; not entered
  
  for (t in 2:K){
    # State process: draw S(t) given S(t-1)
      z[i,t] ~ dcat(ps[z[i,t-1], i, t-1, ])

    # Observation process: draw O(t) given S(t)
      y[i,t] ~ dcat(po[z[i,t], i, t-1, ])
   }
  }


#------ Calculate derived population parameters fro recruitment

for (t in 1:(K-1)){
  qgamma_U[t] <- 1-gamma_U[t]
  qgamma_I[t] <- 1-gamma_I[t]
}

cprob_U[1] <- gamma_U[1]
cprob_I[1] <- gamma_I[1]

for (t in 2:(K-1)){
  cprob_U[t] <- gamma_U[t] * prod(qgamma_U[1:(t-1)])
  cprob_I[t] <- gamma_I[t] * prod(qgamma_I[1:(t-1)])
} #t

psi_U <- sum(cprob_U[])            # Inclusion probability
psi_I <- sum(cprob_I[])            # Inclusion probability

for (t in 1:(K-1)){
  b_U[t] <- cprob_U[t] / psi_U      # Entry probability
  b_I[t] <- cprob_I[t] / psi_I      # Entry probability
} #t

for (i in 1:M){
  for (t in 2:K){
    al_U[i,t-1] <- equals(z[i,t], 2)
    al_I[i,t-1] <- equals(z[i,t], 3)
  } #t
  for (t in 1:(K-1)){
    d_U[i,t] <- equals(z[i,t] - al_U[i,t], 0)
    d_I[i,t] <- equals(z[i,t] - al_I[i,t], 0)
  } #t   
  alive[i] <- sum(al_U[i,], al_I[i,])
} #i

for (t in 1:(K-1)){
  N_U[t] <- sum(al_U[,t])        # Actual population size
  N_I[t] <- sum(al_I[,t])        # Actual population size
  B_U[t] <- sum(d_U[,t])         # Number of entries
  B_I[t] <- sum(d_I[,t])         # Number of entries
} #t

for (i in 1:M){
  w[i] <- 1-equals(alive[i],0)
} #i

Nsuper <- sum(w[])            # Superpopulation size

#---------------- Calculate Bayesian posterior predictive check
#for(t in 1:(K-1)){
#  for(i in 1:M){
#    for(s in 1:state){
#      for(j in 1:n.surv)
#        r[s, i, t, j]     <- ifelse(y[i, t+1, j] == s, 1, 0)
#        r.new[s, i, t, j] <- ifelse(y.new[i, t+1, j] == s, 1, 0)
#      }
#    }
#  }
#}
#
#for(t in 1:(K-1)){
#  for(s in 1:state){
#    # sum across individuals in each state, each time period
#      R_state[s, t]     <- sum(r[s, , t, ])
#      R_state.new[s, t] <- sum(r.new[s, , t, ])
#  }
#}
#
#for(t in 1:(K-1)){
#  for(s in 1:state){
#    for(i in 1:M){
#      for(j in 1:n.surv)
#
#      muy[i, j, t, s] <-  ps[z[i, t], i, t, z[i, t+1]] * 
#                          po[z[i, t], i, t, j, s]
#      } 
#    }
#    PO_expt[s, t] <- sum(0.01, sum(muy[ , , t, s]))
#  }
#}
#
##--------- Posterior predictive check
#
#for(t in 1:(K-1)){
#  for(s in 1:state){
#    E.act[s, t] <- pow(pow(R_state[s, t], 0.5) - pow(PO_expt[s, t], 0.5), #2)
#    E.new[s, t] <- pow(pow(R_state.new[s, t], 0.5) - pow(PO_expt[s, t], 0#.5), 2)
#  }
#}
#
#zzz.fit <- sum(E.act[,])
#zzz.fit.new <- sum(E.new[,])

}
",fill = TRUE)
sink()
}

Bundle the data

# Analysis of the JS model as a multistate model

# Augment data approach allows us to estimate the total population size
# Idea = add a large number of 0 capture histories, and then the model will determine if those individuals were missed or do not exit using information from the detection probability and recruitment rates
nz <- 500

# Format the data with an added dummy column
# Dummy column = 1st column is all 1s
# Allows the model to calculate recrutiment into the population for the first season

CH.ms <- array(0, dim = c(dim(CH)[1]+nz, dim(CH)[2]+1))

for(i in 1:dim(CH)[1]){
  for(j in 1:dim(CH)[2]){
      CH.ms[i,j+1] <- CH[i, j]
  }
}

CH.du <- CH.ms[1:dim(CH)[1],]

# Recode CH matrix: a 0 is not allowed in JAGS!
CH.ms[CH.ms==0] <- 3    # Not seen = 3, seen = 1 or 2

# Bundle data
jags.data <- list(y = CH.ms, 
                  M = dim(CH.ms)[1],
                  K = dim(CH.ms)[2],
                  n.surv = 1,
                  state = 3
)

# Initial values
ch <- CH.du

js.multistate.init <- function(ch, nz){
# 4 state system
    # 1 = Not entered
    # 2 = uninfected
    # 3 = infected
    # 4 = Dead 
  
# 3 observation states
    # seen uninfected
    # seen infected
    # not seen

# Put an NA when an individual was not seen ( = 0 or 3)
  ch[ch==0] <- NA
  ch[ch==3] <- NA
  
  state <- ch
  colnames(state) <- 1:dim(state)[2]
  
  # When the individual is known to be alive between first and last capture fill it in with 2s
  for(i in 1:dim(ch)[1]){    # For each individual

    n1 <- min(which(ch[i,] < 3))
    n2 <- max(which(ch[i,] < 3))
    
    fill <- which(is.na(state[i,n1:n2]) == TRUE)
    state[i, names(fill)] <- 2
  }
  state <- state + 1
  
  f <- array(NA, dim = c(dim(ch)[1]))
  
    for(i in 1:dim(ch)[1]){
        f[i] <- min(which(!is.na(ch[i, ])))
    }

  l <- array(NA, dim = c(dim(ch)[1]))
  
  for(i in 1:dim(ch)[1]){
      l[i] <- max(which(!is.na(ch[i, ])))
  }

  for (i in 1:dim(ch)[1]){
# Before initial observation- not observed
    state[i,1:(f[i]-1)] <- 1
    
      # If the last time the animal was seen != the last survey date
      # Then add 1 to the survey date, and fill the rest to the last occasion with 4
      if(l[i]!= dim(ch)[2]){state[i, (l[i]+1):dim(ch)[2]] <- 4}
    
      if(ch[i, f[i]] == 1){state[i, f[i]] <- 2}
      if(ch[i, f[i]] == 2){state[i, f[i]] <- 3} 
    }
 
  state2 <- array(NA, dim = c(dim(ch)[1]+nz, dim(ch)[2]))
  state3 <- array(NA, dim = c(dim(ch)[1]+nz, dim(ch)[2]))

# Copy over the matrix you generated to the one with the right dimensions    
  for(i in 1:dim(ch)[1]){
    for(j in 1:(dim(ch)[2])){
        state2[i,j] <- state[i, j]
      }
    }

  # For all the data augmented individuals- their state == 1
  for(i in (dim(state)[1]+1):dim(state2)[1]){ 
    for(j in 1:(dim(state)[2])){ # For all occasions
        state2[i,j] <- 1
    }
  }  
  
  return(state2)
}

n.occasions <- dim(CH.ms)[2]

zinit <- js.multistate.init(CH.du, nz)

zinit <- apply(zinit, c(1, 2), max)
zinit[,1] <- NA


#--- Bundle the inits 

inits <- function(){list(phi_U = runif(1, 0.5, 1), 
                         phi_I = runif(1, 0.5, 1), 
                         
                         p_U = runif(1, 0.5, 1), 
                         p_I = runif(1, 0.5, 1), 
                         
                         beta_UI = runif(1, 0, 0.5),
                         beta_IU = runif(1, 0, 0.5),
                         
                         gamma_U = rep(runif(1, 0, 0.5), times = n.occasions-1), 
                         gamma_I = rep(runif(1, 0, 0.5), times = n.occasions-1),
                         
                         z = zinit
                         )}    

#------- Parameters monitored

params <- c("phi_U", 
            "phi_I", 
            "p_U", 
            "p_I",
            "beta_UI",
            "beta_IU",
            "gamma_U",
            "gamma_I")

#------- MCMC settings

ni <- 1000
nb <- 100
nt <- 10
nc <- 3
na <- 100

#------ call Library
library("jagsUI")
## Loading required package: lattice
## 
## Attaching package: 'jagsUI'
## The following object is masked from 'package:utils':
## 
##     View
#------- Call JAGS from R

js.ms <- jags(data = jags.data, inits = inits, parameters.to.save = params, model.file = "model_JS.txt", n.chains = nc, n.thin = nt, n.iter = ni, n.burnin = nb, n.adapt = na, parallel = TRUE)
## 
## Processing function input....... 
## 
## Done. 
##  
## Beginning parallel processing using 3 cores. Console output will be suppressed.
## 
## Parallel processing completed.
## 
## Calculating statistics....... 
## 
## Done.
print(js.ms, dig = 3)
## JAGS output for model 'model_JS.txt', generated by jagsUI.
## Estimates based on 3 chains of 1000 iterations,
## adaptation = 100 iterations (sufficient),
## burn-in = 100 iterations and thin rate = 10,
## yielding 270 total samples from the joint posterior. 
## MCMC ran in parallel for 2.58 minutes at time 2018-05-10 10:39:56.
## 
##               mean     sd    2.5%     50%   97.5% overlap0 f  Rhat n.eff
## phi_U        0.902  0.047   0.801   0.907   0.980    FALSE 1 1.037    58
## phi_I        0.821  0.046   0.725   0.822   0.901    FALSE 1 1.011   149
## p_U          0.657  0.092   0.490   0.656   0.854    FALSE 1 1.060    36
## p_I          0.578  0.059   0.467   0.574   0.697    FALSE 1 1.034    61
## beta_UI      0.419  0.065   0.292   0.415   0.554    FALSE 1 1.009   270
## beta_IU      0.100  0.035   0.043   0.096   0.183    FALSE 1 1.005   232
## gamma_U[1]   0.043  0.013   0.024   0.042   0.071    FALSE 1 1.006   270
## gamma_U[2]   0.028  0.011   0.010   0.026   0.054    FALSE 1 1.002   270
## gamma_U[3]   0.040  0.013   0.019   0.038   0.068    FALSE 1 1.030    70
## gamma_U[4]   0.060  0.015   0.036   0.059   0.093    FALSE 1 1.018   270
## gamma_U[5]   0.022  0.011   0.004   0.021   0.046    FALSE 1 1.004   270
## gamma_I[1]   0.043  0.011   0.024   0.042   0.067    FALSE 1 1.001   270
## gamma_I[2]   0.029  0.013   0.006   0.029   0.056    FALSE 1 1.025   185
## gamma_I[3]   0.031  0.014   0.005   0.030   0.061    FALSE 1 1.002   270
## gamma_I[4]   0.058  0.017   0.029   0.058   0.094    FALSE 1 1.021   118
## gamma_I[5]   0.030  0.016   0.004   0.028   0.072    FALSE 1 0.996   270
## deviance   753.818 66.771 625.875 758.226 872.954    FALSE 1 1.068    35
## 
## Successful convergence based on Rhat values (all < 1.1). 
## Rhat is the potential scale reduction factor (at convergence, Rhat=1). 
## For each parameter, n.eff is a crude measure of effective sample size. 
## 
## overlap0 checks if 0 falls in the parameter's 95% credible interval.
## f is the proportion of the posterior with the same sign as the mean;
## i.e., our confidence that the parameter is positive or negative.
## 
## DIC info: (pD = var(deviance)/2) 
## pD = 2112 and DIC = 2865.849 
## DIC is an estimate of expected predictive error (lower is better).

Check the chains

plot(js.ms)

Check the parameters from the model with the truth

The data set we used was actually simulated! We know the truth!